Protein-protein interaction analysis of Alzheimer's disease and NAFLD based on systems biology methods reveals common ancestor pathways

Aim: To analyze networks reconstructed for two diseases, NAFLD and Alzheimer's disease, and the relationship between them using systems biology methods. Background: NAFLD and Alzheimer's disease are two complex diseases with increasing prevalence and high costs for countries. Some reports describe a relationship between these two diseases and similar pathways of progression. In addition, they share some risk factors, especially lifestyle factors such as diet and exercise. A systems biology approach can therefore help to uncover their relationship. Methods: The DisGeNET and STRING databases were the sources of disease genes and of network construction. Three plugins of the Cytoscape software, ClusterONE, ClueGO and CluePedia, were used to cluster and analyze the networks and to perform pathway enrichment. An R package was used to determine the best centrality method. Finally, hub and bottleneck nodes were defined based on degree and betweenness. Results: NAFLD and Alzheimer's disease shared 190 common genes, which were used to construct a network with the STRING database. The resulting network contained 182 nodes and 2591 edges and comprised four clusters. Separate enrichment of these clusters led to carbohydrate metabolism, long-chain fatty acid metabolism, and regulation of the JAK-STAT and IL-17 signaling pathways, respectively. Seven genes were selected as hub-bottlenecks: IL6, AKT1, TP53, TNF, JUN, VEGFA and PPARG. Enrichment of these proteins and their first neighbors in the network using the OMIM database led to diabetes and obesity as ancestors of NAFLD and AD. Conclusion: Systems biology methods, specifically PPI networks, can be useful for analyzing complex related diseases. Hub and bottleneck proteins should be considered as targets for drug design and as candidate disease markers.

Introduction
Alzheimer's disease (AD) is a neurodegenerative disease that is one of the important diseases in industrial [...] AD showed that, in addition to age and heredity, lifestyle is an important factor in the progression of this disease (5). Lifestyle also plays an important role in the development of diseases such as obesity, diabetes and fatty liver (6). Fatty liver disease is divided into two forms, alcoholic fatty liver disease (AFLD) and non-alcoholic fatty liver disease (NAFLD), which mainly result from high consumption of alcohol and fat (6). NAFLD is one of the most important causes of liver disease in the United States, where 30% of the population is affected by it (7). Like AD, NAFLD depends on lifestyle and diet. Our previous study on AD, a meta-analysis of microarray data, showed that NAFLD is clearly related to AD (8). Other studies of the relationship between AD and NAFLD have focused on common genes such as LRP1 (9), a cross-sectional study (10) and an AD transgenic model (11). Protein-protein interaction (PPI) network analysis is one of the major fields in systems biology, in which the complex interactome of proteins is analyzed as the main source of data (12). Systems biology methods such as comparing the gene sets of diseases, constructing PPI networks and performing pathway enrichment can help to decipher the shared mechanisms of NAFLD and AD. In this study, we report seven important proteins shared between these diseases that can be used not only as disease markers but also as targets for drug design. Pathways shared between these diseases are also introduced.

Methods
DisGeNET is a publicly available discovery database that gathers genes and variants associated with human diseases (13). The genes related to NAFLD and AD were exported from the DisGeNET database, and the genes common to the two diseases were used to construct a PPI network with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). STRING is a database of predicted protein-protein interactions at EMBL that aggregates results extracted from many protein-protein interaction databases, such as Mint and BioGrid. It also uses information from KEGG pathways and Reactome to provide the best annotations for the interactions of a protein (14). The common network was constructed by importing the shared genes into the STRING database and was clustered with the ClusterONE plugin of the Cytoscape software (15), which finds overlapping protein complexes in a protein interaction network loaded into Cytoscape (overlap threshold = 1, node penalty = 0, haircut threshold = 0) (16). Pathway enrichment and the analysis of relations between pathways were performed using the ClueGO and CluePedia plugins of Cytoscape (17,18). To find the best centrality method for selecting the most important nodes, we used an R package named CINNA (19,20).
A network is composed of nodes (e.g., genes or proteins) and edges/links (e.g., co-expression relationships or physical interactions). In network biology, degree and betweenness are important centrality parameters that are useful for analyzing network topology. The number of edges/links of a node is called the degree of that node. Nodes with high degree are called hubs, and nodes in the top ten or top five percent of betweenness centrality are called bottlenecks (both cut-offs depend on the researcher's definition) (21). Nodes that are simultaneously hubs and bottlenecks are named hub-bottlenecks (22). The average degree (A.D) and standard deviation (SD) of the degrees were calculated, and nodes with a degree above 2SD + A.D were selected as hub proteins in each network. The nodes in the top five percent of betweenness centrality were selected as bottleneck proteins. Shared genes, hubs and bottleneck proteins of these two networks were extracted and used for further analysis. We used Cytoscape to analyze the networks and to extract hubs, hub-bottlenecks, and their first neighbors (23).
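The hub and bottleneck criteria described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Cytoscape or CINNA workflow used in the study: the function names and the toy edge list are hypothetical, betweenness is computed with Brandes' algorithm for unweighted, undirected graphs, and the population standard deviation is assumed because the text does not state which SD estimator was used.

```python
from collections import deque
from statistics import mean, pstdev
import math


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)   # -1 marks "not reached yet"
        dist[s] = 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def hub_bottlenecks(edges, top_fraction=0.05):
    """Return (hubs, bottlenecks, hub_bottlenecks) for an edge list."""
    adj = build_adjacency(edges)
    degree = {v: len(nb) for v, nb in adj.items()}
    # Hub cut-off: average degree + 2 SD (population SD is an assumption).
    cutoff = mean(degree.values()) + 2 * pstdev(degree.values())
    hubs = {v for v, d in degree.items() if d > cutoff}
    bc = betweenness(adj)
    # Bottlenecks: top five percent of nodes by betweenness (at least one node).
    k = max(1, math.ceil(top_fraction * len(adj)))
    bottlenecks = set(sorted(bc, key=bc.get, reverse=True)[:k])
    return hubs, bottlenecks, hubs & bottlenecks


# Hypothetical toy network: one central node linked to ten leaves.
toy_edges = [("HUB", f"G{i}") for i in range(10)]
toy_hubs, toy_bottlenecks, toy_hub_bottlenecks = hub_bottlenecks(toy_edges)
toy_betweenness = betweenness(build_adjacency(toy_edges))
```

In the toy network only the central node exceeds the degree cut-off and lies on every shortest path between leaves, so it is the single hub-bottleneck. Ties at the betweenness cut-off are broken arbitrarily in this sketch.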

Results
From DisGeNET, 332 and 1200 genes were extracted for NAFLD and Alzheimer's disease, respectively. In total, 189 genes were shared between the two lists and were named common genes. The common gene network constructed using the STRING database has 182 nodes, 2591 edges and four clusters (Figure 1). Cluster analysis with the ClueGO and CluePedia plugins showed 29 statistically meaningful pathways, with no duplication between them. Cluster one mainly includes carbohydrate metabolism pathways and their related signaling, and the main category of this cluster was the AMPK signaling pathway. In cluster two, long-chain fatty acid metabolism and related metabolic processes, including those of arachidonic acid, xenobiotics and calciferol, were enriched. Finally, enrichment of cluster three led to signaling pathways such as regulation of the JAK-STAT cascade, the IL-17 signaling pathway and the AGE-RAGE signaling pathway in diabetic complications. Because of the low number of nodes in cluster four, pathway enrichment was not meaningful for it (table 1).
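As a rough illustration of how a pathway can be judged statistically meaningful for a cluster, the following Python sketch computes a one-sided hypergeometric over-representation p-value. The text does not state which test or multiple-testing correction was applied, so the choice of test is an assumption, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_cluster, n_overlap):
    """P(X >= n_overlap) when n_cluster genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_cluster)
    total = comb(n_background, n_cluster)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 5 in the pathway,
# a cluster of 5 genes, all 5 of which fall in the pathway.
p_all_in_pathway = hypergeom_enrichment_p(10, 5, 5, 5)
# With no required overlap the tail covers every outcome.
p_no_overlap_required = hypergeom_enrichment_p(10, 5, 5, 0)
```

In practice the p-values of all tested pathways would also be corrected for multiple testing before a pathway is called enriched; that step is omitted here.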
Figure 1. Network constructed from the common genes between NAFLD and AD. The network includes four clusters that are highlighted by different colors. Cluster-1: orange, cluster-2: red, cluster-3: green, and cluster-4: gray.
Based on the CINNA package results, degree and betweenness were the best-suited centrality methods for this network. In the next step, the network was analyzed in Cytoscape to define hubs and hub-bottlenecks. The results showed that IL6, AKT1, TP53, TNF, JUN, VEGFA, PPARG, MAPK3, IGF1, and LEP are hubs; the first seven proteins were also bottlenecks and were therefore selected as hub-bottlenecks (table 2). By extracting these hub-bottlenecks and their first neighbors from the network, we obtained a new network that contains 82% of the nodes (150 nodes) and 92% of the edges (2367 edges) of the main network. Analyzing this network in OMIM, as the main database for diseases, led to diabetes and obesity (table 3).
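Extracting seed proteins together with their first neighbors, and measuring how much of the main network the result covers, can be sketched as follows. This is a minimal sketch with hypothetical function names and a toy edge list; it assumes the extracted network keeps every edge whose two endpoints are both retained (an induced subnetwork), which the text does not state explicitly.

```python
def first_neighbor_subnetwork(edges, seeds):
    """Return the seed nodes plus their direct neighbors, and the edges
    of the main network that connect two retained nodes."""
    seeds = set(seeds)
    nodes = set(seeds)
    for u, v in edges:
        if u in seeds:
            nodes.add(v)
        if v in seeds:
            nodes.add(u)
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, sub_edges


def coverage(edges, seeds):
    """Fraction of nodes and of edges of the main network that are retained."""
    all_nodes = {n for edge in edges for n in edge}
    nodes, sub_edges = first_neighbor_subnetwork(edges, seeds)
    return len(nodes) / len(all_nodes), len(sub_edges) / len(edges)


# Hypothetical toy network with a single seed node "A".
toy_network = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
sub_nodes, sub_edges = first_neighbor_subnetwork(toy_network, {"A"})
node_fraction, edge_fraction = coverage(toy_network, {"A"})
```

Here the seed "A" and its neighbors "B" and "C" are kept, along with the three edges among them, so the subnetwork covers three of five nodes and three of five edges.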

Discussion
Systems biology methods such as PPI network analysis and pathway enrichment have been used broadly to discover the main proteins and pathways underlying complex diseases (24). Different types of cancer, various neurodegenerative diseases and disorders, and many cellular conditions have been analyzed via protein-protein interaction methods (25-30). The relationship between NAFLD and AD is increasingly recognized (9-11). In this study, we used the complete gene lists of the two diseases (NAFLD and AD), which may share mechanisms based on risk factors and previous studies (8). According to network clustering and subsequent pathway enrichment, 42 pathways were enriched. Altogether, three main groups of pathways are candidate key pathways in both AD and NAFLD: carbohydrate metabolism, long-chain fatty acid metabolism, and IL signaling pathways. Previous studies provide evidence for all of these relationships except the role of long-chain fatty acid metabolism in AD (8,31-37). The seven hub-bottleneck nodes are important targets for both NAFLD and AD. High-level secretion of peripheral IL-6 may be responsible for the acute-phase proteins observed in AD patients (38), and high levels of IL-6 have been detected in NAFLD patients (39). AKT activity in the temporal cortex of Alzheimer's patients was significantly increased (40), and activation of the PI3-K/Akt kinase pathway triggers NAFLD (41). TP53, also known as p53, is up-regulated in Alzheimer's disease (42), and its inhibition attenuates signs of NAFLD (43). Inhibition of TNF alpha decreases amyloid plaques and tau phosphorylation in the mouse brain, and thus the risk of AD (44), and this protein is involved in the pathophysiology of NAFLD (45). Inhibition of JUN is a therapeutic strategy to stop the progression of AD (46), and expression of this protein is increased in NAFLD (47). Abnormal regulation of VEGFA expression is implicated in AD (48) and is involved in the pathophysiology of NAFLD (49). Finally, PPARG is a potential therapeutic target for both AD and NAFLD (50,51).
Analysis of the main nodes and their first neighbors with the OMIM database pointed to diabetes and obesity. We can conclude that diabetes and obesity are common ancestors of AD and NAFLD. These results show that applying systems biology methods can unravel the common mechanisms of AD and NAFLD. The real impact of the common proteins on the treatment of NAFLD and AD needs to be further assessed.